Quantization and the method of k-means
Author
Abstract
The theory developed in the statistical literature for the method of k-means can be applied to the study of optimal k-level vector quantizers. In this paper, I describe some of this theory, including a consistency theorem (Section II) and a central limit theorem (Section IV) for k-means cluster centers. These results help to explain the behavior of optimal vector quantizers constructed from long stretches of ergodic training sequences. I also offer a new proof (Section III) for the consistency theorem, based on an identification of the optimal quantizer with the measure that minimizes a Vasershtein-like distance between an empirical measure and a collection of discrete measures corresponding to k-level quantizers.

By a k-level quantizer I mean a map $q$ from some Euclidean space $\mathbb{R}^d$ into a subset $\{a_1, a_2, \ldots, a_k\}$ of itself. Such a map can be used to convert a d-dimensional input signal $X$ into an output $q(X)$ that can take on at most k different values. An optimal k-level quantizer for a probability distribution $P$ on $\mathbb{R}^d$ minimizes the distortion, as measured by the mean-squared error $\mathbb{P}|X - q(X)|^2$, for a random vector $X$ with distribution $P$. (Instead of the traditional symbol $\mathbb{E}$, I use $\mathbb{P}$ to denote expectations as well as probabilities. A similar convention applies for expectations with respect to the probability measure $P$.) Of course, this makes sense only if the expected squared Euclidean distance $P|x|^2$ is finite, or, equivalently, the $L^2$ norm $\|X\|_2 = (\mathbb{P}|X|^2)^{1/2}$ of the corresponding random vector $X$ is finite. Such a constraint will remain on $P$ throughout this paper.

Searching for an optimal k-level quantizer for $P$ is equivalent to the k-means problem: find a set $A = \{a_1, a_2, \ldots, a_k\}$ of cluster centers to minimize the within-cluster sum of squares
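To make the quantizer and its distortion concrete, here is a minimal sketch on a toy training sequence. It assumes the empirical version of the objective, $\frac{1}{n}\sum_i \min_{a \in A} |x_i - a|^2$, and uses Lloyd's iteration, a common heuristic that is not taken from the abstract and that only finds a local minimum, not necessarily the optimal quantizer. The function names (`nearest`, `quantize`, `distortion`, `lloyd`) and the toy data are hypothetical, chosen for illustration.

```python
import random

def nearest(x, centers):
    """Index of the center closest to x in squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda j: sum((xi - ci) ** 2 for xi, ci in zip(x, centers[j])))

def quantize(x, centers):
    """q(x): map x to its nearest center, giving a k-level quantizer."""
    return centers[nearest(x, centers)]

def distortion(points, centers):
    """Empirical mean-squared error (1/n) * sum_i |x_i - q(x_i)|^2."""
    total = 0.0
    for x in points:
        q = quantize(x, centers)
        total += sum((xi - qi) ** 2 for xi, qi in zip(x, q))
    return total / len(points)

def lloyd(points, k, iters=50, seed=0):
    """Lloyd's iteration (assumed heuristic): alternate nearest-center
    assignment and centroid update. Converges to a local minimum only."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in points:
            groups[nearest(x, centers)].append(x)
        new_centers = []
        for j, g in enumerate(groups):
            if g:
                d = len(g[0])
                new_centers.append(tuple(sum(p[i] for p in g) / len(g) for i in range(d)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cell
        if new_centers == centers:  # assignments stopped changing
            break
        centers = new_centers
    return centers

# Toy training sequence: two well-separated clouds in R^2 (illustrative data).
rng = random.Random(1)
points = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(100)] + \
         [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(100)]
centers = lloyd(points, k=2)
print(sorted(centers), distortion(points, centers))
```

With two well-separated clouds, the fitted centers should land near the cloud means, and the empirical distortion should be close to the within-cloud variance. On harder data the result depends on the starting centers, which is one reason the consistency results concern the global minimizer rather than the output of a particular algorithm.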
Similar resources
Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, the use of clustering algorithms for data fusion at the decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined by using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...
NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self-Organizing Map
Identifying clusters is an important aspect of data analysis. This paper proposes a novel data clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizing map (NGTSOM) and neural gas (NG) are used in combination with Competitive Hebbian Learning (CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clustering data. Different ...
Detection of perturbed quantization (PQ) steganography based on empirical matrix
The Perturbed Quantization (PQ) steganography scheme is almost undetectable with current steganalysis methods. We present a new steganalysis method for detection of this data hiding algorithm. We show that the PQ method distorts the dependencies of DCT coefficient values; especially changes much lower than significant bit planes. For steganalysis of PQ, we propose feature extraction from the e...
A Method to Reduce Effects of Packet Loss in Video Streaming Using Multiple Description Coding
Multiple description (MD) coding has evolved as a promising technique for improving the error resilience of multimedia systems in real-time applications over error-prone communication channels. Although multiple description lattice vector quantization (MDCLVQ) is an efficient method for transmitting data reliably over error-prone channels, this method doesn't consider disc...
Disguised Face Recognition by Using Local Phase Quantization and Singular Value Decomposition
Disguised face recognition is a major challenge in the field of face recognition that has received relatively little attention. Therefore, in this paper a disguised face recognition algorithm based on the Local Phase Quantization (LPQ) method and Singular Value Decomposition (SVD) is presented, which deals with two main challenges. The first challenge is when an individual intentionally alters the appearance ...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...
Journal title:
- IEEE Trans. Information Theory
Volume 28
Publication date 1982